{
 "cells": [
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "# Sector Neutral (Solution)"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "## Install packages"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "import sys"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "!{sys.executable} -m pip install -r requirements.txt"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "import cvxpy as cvx\n",
    "import numpy as np\n",
    "import pandas as pd\n",
    "import time\n",
    "import os\n",
    "import quiz_helper\n",
    "import matplotlib.pyplot as plt"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "%matplotlib inline\n",
    "plt.style.use('ggplot')\n",
    "plt.rcParams['figure.figsize'] = (14, 8)"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "## following zipline bundle documentation\n",
    "\n",
    "http://www.zipline.io/bundles.html#ingesting-data-from-csv-files"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "### data bundle"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "import os\n",
    "import quiz_helper\n",
    "from zipline.data import bundles"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "os.environ['ZIPLINE_ROOT'] = os.path.join(os.getcwd(), '..', '..','data','module_4_quizzes_eod')\n",
    "ingest_func = bundles.csvdir.csvdir_equities(['daily'], quiz_helper.EOD_BUNDLE_NAME)\n",
    "bundles.register(quiz_helper.EOD_BUNDLE_NAME, ingest_func)\n",
    "print('Data Registered')"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "### Build pipeline engine"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "from zipline.pipeline import Pipeline\n",
    "from zipline.pipeline.factors import AverageDollarVolume\n",
    "from zipline.utils.calendars import get_calendar\n",
    "\n",
    "universe = AverageDollarVolume(window_length=120).top(500) \n",
    "trading_calendar = get_calendar('NYSE') \n",
    "bundle_data = bundles.load(quiz_helper.EOD_BUNDLE_NAME)\n",
    "engine = quiz_helper.build_pipeline_engine(bundle_data, trading_calendar)"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "### View Data¶\n",
    "With the pipeline engine built, let's get the stocks at the end of the period in the universe we're using. We'll use these tickers to generate the returns data for the our risk model."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "universe_end_date = pd.Timestamp('2016-01-05', tz='UTC')\n",
    "\n",
    "universe_tickers = engine\\\n",
    "    .run_pipeline(\n",
    "        Pipeline(screen=universe),\n",
    "        universe_end_date,\n",
    "        universe_end_date)\\\n",
    "    .index.get_level_values(1)\\\n",
    "    .values.tolist()\n",
    "    \n",
    "universe_tickers"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "# Get Returns data"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "from zipline.data.data_portal import DataPortal\n",
    "\n",
    "data_portal = DataPortal(\n",
    "    bundle_data.asset_finder,\n",
    "    trading_calendar=trading_calendar,\n",
    "    first_trading_day=bundle_data.equity_daily_bar_reader.first_trading_day,\n",
    "    equity_minute_reader=None,\n",
    "    equity_daily_reader=bundle_data.equity_daily_bar_reader,\n",
    "    adjustment_reader=bundle_data.adjustment_reader)"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "## Get pricing data helper function"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "def get_pricing(data_portal, trading_calendar, assets, start_date, end_date, field='close'):\n",
    "    end_dt = pd.Timestamp(end_date.strftime('%Y-%m-%d'), tz='UTC', offset='C')\n",
    "    start_dt = pd.Timestamp(start_date.strftime('%Y-%m-%d'), tz='UTC', offset='C')\n",
    "\n",
    "    end_loc = trading_calendar.closes.index.get_loc(end_dt)\n",
    "    start_loc = trading_calendar.closes.index.get_loc(start_dt)\n",
    "\n",
    "    return data_portal.get_history_window(\n",
    "        assets=assets,\n",
    "        end_dt=end_dt,\n",
    "        bar_count=end_loc - start_loc,\n",
    "        frequency='1d',\n",
    "        field=field,\n",
    "        data_frequency='daily')"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "## get pricing data into a dataframe"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "returns_df = \\\n",
    "    get_pricing(\n",
    "        data_portal,\n",
    "        trading_calendar,\n",
    "        universe_tickers,\n",
    "        universe_end_date - pd.DateOffset(years=5),\n",
    "        universe_end_date)\\\n",
    "    .pct_change()[1:].fillna(0) #convert prices into returns\n",
    "\n",
    "returns_df"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "## Sector data helper function\n",
    "We'll create an object for you, which defines a sector for each stock.  The sectors are represented by integers.  We inherit from the Classifier class.  [Documentation for Classifier](https://www.quantopian.com/posts/pipeline-classifiers-are-here), and the [source code for Classifier](https://github.com/quantopian/zipline/blob/master/zipline/pipeline/classifiers/classifier.py)"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "from zipline.pipeline.classifiers import Classifier\n",
    "from zipline.utils.numpy_utils import int64_dtype\n",
    "class Sector(Classifier):\n",
    "    dtype = int64_dtype\n",
    "    window_length = 0\n",
    "    inputs = ()\n",
    "    missing_value = -1\n",
    "\n",
    "    def __init__(self):\n",
    "        self.data = np.load('../../data/project_4_sector/data.npy')\n",
    "\n",
    "    def _compute(self, arrays, dates, assets, mask):\n",
    "        return np.where(\n",
    "            mask,\n",
    "            self.data[assets],\n",
    "            self.missing_value,\n",
    "        )"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "sector = Sector()\n",
    "sector"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "len(sector.data)"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "sector.data"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "## Quiz 1\n",
    "How many unique sectors are in the sector variable?"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "## Answer 1\n",
    "There are 11 sector categories.\n",
    "-1 represents missing values.  There are categories 0 to 10"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "print(f\"set of unique categories: {set(sector.data)}\")"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "## Create an alpha factor based on momentum\n",
    "\n",
    "We want to calculate the one-year return.  \n",
    "In other words, get the close price of today, minus the close price of 252 trading days ago, and divide by that price from 252 days ago.\n",
    "\n",
    "$1YearReturn_t = \\frac{price_{t} - price_{t-252}}{price_{t-252}}$"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "from zipline.pipeline.factors import Returns"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "## We'll use 2 years of data to calculate the factor"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "**Note:** Going back 2 years falls on a day when the market is closed. Pipeline package doesn't handle start or end dates that don't fall on days when the market is open. To fix this, we went back 2 extra days to fall on the next day when the market is open."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "factor_start_date = universe_end_date - pd.DateOffset(years=2, days=2)\n",
    "factor_start_date"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "## 1 year returns can be the basis for an alpha factor\n",
    "p1 = Pipeline(screen=universe)\n",
    "rets1 = Returns(window_length=252, mask=universe)\n",
    "p1.add(rets1,\"1YearReturns\")\n",
    "df1 = engine.run_pipeline(p1, factor_start_date, universe_end_date)"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "#graphviz lets us visualize the pipeline\n",
    "import graphviz"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "p1.show_graph(format='png')"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "## View the data of the factor"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "df1.head()"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "## Explore the demean function\n",
    "\n",
    "The Returns class inherits from zipline.pipeline.factors.factor.  \n",
    "[The documentation for demean is located here](https://www.zipline.io/appendix.html#zipline.pipeline.factors.Factor.demean), and is also pasted below:\n",
    "\n",
    "```\n",
    "demean(mask=sentinel('NotSpecified'), groupby=sentinel('NotSpecified'))[source]\n",
    "Construct a Factor that computes self and subtracts the mean from row of the result.\n",
    "\n",
    "If mask is supplied, ignore values where mask returns False when computing row means, and output NaN anywhere the mask is False.\n",
    "\n",
    "If groupby is supplied, compute by partitioning each row based on the values produced by groupby, de-meaning the partitioned arrays, and stitching the sub-results back together.\n",
    "\n",
    "Parameters:\t\n",
    "mask (zipline.pipeline.Filter, optional) – A Filter defining values to ignore when computing means.\n",
    "groupby (zipline.pipeline.Classifier, optional) – A classifier defining partitions over which to compute means.\n",
    "```"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "## Quiz 2\n",
    "\n",
    "By looking at the documentation, and then the source code for `demean`, what are two parameters for this function?  Which one or ones would you call if you wanted to demean by sector and wish to demean for all values in the chosen universe?\n",
    "\n",
    "[The source code](https://www.zipline.io/_modules/zipline/pipeline/factors/factor.html#Factor.demean) has useful comments to help you answer this question."
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "## Answer 2\n",
    "\n",
    "We would use the groupby parameter, and we don't need to use the mask parameter, since we are not going to exclude any of the stocks in the universe from the demean calculation.\n"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "## Quiz 3\n",
    "Turn 1 year returns into an alpha factor\n",
    "\n",
    "We can do some processing to convert our signal (1 year return) into an alpha factor. One step is to demean by sector.\n",
    "\n",
    "* demean\n",
    "For each stock, we want to take the average return of stocks that are in the same sector, and then remove this from the return of each individual stock."
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "## Answer 3"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "#TODO\n",
    "# create a pipeline called p2\n",
    "p2 = Pipeline(screen=universe)\n",
    "# create a factor of one year returns, deman by sector\n",
    "factor_demean_by_sector = (\n",
    "    Returns(window_length=252, mask=universe).\n",
    "    demean(groupby=Sector()) #we use the custom Sector class that we reviewed earlier\n",
    ")\n",
    "# add the factor to the p2 pipeline\n",
    "p2.add(factor_demean_by_sector, 'Momentum_1YR_demean_by_sector')"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "## visualize the second pipeline"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "p2.show_graph(format='png')"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "## Quiz 4\n",
    "How does this pipeline compare with the first pipeline that we created earlier?"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "## Answer 4\n",
    "The second pipeline now adds sector information in the GroupedRowTransform('demean') step."
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "## run pipeline and view the factor data"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "df2 = engine.run_pipeline(p2, factor_start_date, universe_end_date)"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "df2.head()"
   ]
  }
 ],
 "metadata": {
  "kernelspec": {
   "display_name": "Python 3",
   "language": "python",
   "name": "python3"
  },
  "language_info": {
   "codemirror_mode": {
    "name": "ipython",
    "version": 3
   },
   "file_extension": ".py",
   "mimetype": "text/x-python",
   "name": "python",
   "nbconvert_exporter": "python",
   "pygments_lexer": "ipython3",
   "version": "3.6.3"
  }
 },
 "nbformat": 4,
 "nbformat_minor": 2
}
